Skip to content

Conversation

@Pearl1594
Copy link
Contributor

@Pearl1594 Pearl1594 commented Nov 28, 2025

Description

This PR fixes: #12123

This pr prevents the scaling of the cluster by checking if the overall resources / vm count with the accounts resource limits to prevent cluster in an incorrect state.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

Before FIx:

Set account's resource limit for VM count to 3
Deployed a CKS cluster with 1 worker node and then attempted to scale it to 3 and it fails
image

it scales by 1 node but increases the overall size of the cluster to 4 and puts the cluster in an incorrect state

After fix:

Performed the same test as above, here it prevents scaling altogether as it pre-emptively calculates if the overall resources exceeds what's set for the account.

image

@codecov
Copy link

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 0% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.56%. Comparing base (e23c7ef) to head (0688240).
⚠️ Report is 5 commits behind head on 4.22.

Files with missing lines Patch % Lines
...bernetes/cluster/KubernetesClusterManagerImpl.java 0.00% 33 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12167      +/-   ##
============================================
- Coverage     17.56%   17.56%   -0.01%     
- Complexity    15545    15546       +1     
============================================
  Files          5910     5910              
  Lines        529123   529164      +41     
  Branches      64627    64636       +9     
============================================
+ Hits          92937    92940       +3     
- Misses       425733   425771      +38     
  Partials      10453    10453              
Flag Coverage Δ
uitests 3.58% <ø> (ø)
unittests 18.63% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube Cloud

@weizhouapache
Copy link
Member

@Pearl1594
thanks for the fix

IMHO, resource limitation could be one of the reasons what cause CKS scaling to fail
do we need to consider other cases ? maybe change the process of scaling ?

}
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

Comment on lines +403 to +404
@Inject
ResourceLimitService resourceLimitService;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the sonar warning here is new to me. If we need to deal with this that is going to be a huge effort as this pattern is all over the project.

@Pearl1594
Copy link
Contributor Author

@Pearl1594 thanks for the fix

IMHO, resource limitation could be one of the reasons what cause CKS scaling to fail do we need to consider other cases ? maybe change the process of scaling ?

I don't quite understand what you mean @weizhouapache - what issues and what change in process are we talking about. Could you please give some clarity on that. Thanks.

@weizhouapache
Copy link
Member

@Pearl1594 thanks for the fix
IMHO, resource limitation could be one of the reasons what cause CKS scaling to fail do we need to consider other cases ? maybe change the process of scaling ?

I don't quite understand what you mean @weizhouapache - what issues and what change in process are we talking about. Could you please give some clarity on that. Thanks.

I was thinking other possible edge cases, for example, the 2nd vm is not deployed due to the capacity of the system/storage.
never mind, current changes look good.

Copy link
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

not tested yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants